YouTube Trending Video EDA

The objective of this exploratory data analysis is to gain insight into videos that have appeared on YouTube's trending list. We analyze patterns in these videos, examining view counts, publication dates, and channels, to better understand what trending videos have in common and to identify trends that could inform content creators and marketers.

Importing the required libraries

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import plotly.graph_objs as go
import plotly.express as px
import matplotlib.pyplot as plt
from plotly.offline import init_notebook_mode,iplot
init_notebook_mode(connected=False)
import seaborn as sns
import plotly.offline as pyo
from plotly.subplots import make_subplots
import warnings
# Ignore warnings
warnings.filterwarnings('ignore')

Loading the dataset into a DataFrame

df=pd.read_csv('CA_youtube_trending_data.csv')
df.head()
video_id title publishedAt channelId channelTitle categoryId trending_date tags view_count likes dislikes comment_count thumbnail_link comments_disabled ratings_disabled description
0 KX06ksuS6Xo Diljit Dosanjh: CLASH (Official) Music Video |... 2020-08-11T07:30:02Z UCZRdNleCgW-BGUJf-bbjzQg Diljit Dosanjh 10 2020-08-12T00:00:00Z clash diljit dosanjh|diljit dosanjh|diljit dos... 9140911 296541 6180 30059 https://i.ytimg.com/vi/KX06ksuS6Xo/default.jpg False False CLASH official music video performed by DILJIT...
1 J78aPJ3VyNs I left youtube for a month and THIS is what ha... 2020-08-11T16:34:06Z UCYzPXprvl5Y-Sf0g4vX-m6g jacksepticeye 24 2020-08-12T00:00:00Z jacksepticeye|funny|funny meme|memes|jacksepti... 2038853 353797 2628 40222 https://i.ytimg.com/vi/J78aPJ3VyNs/default.jpg False False I left youtube for a month and this is what ha...
2 M9Pmf9AB4Mo Apex Legends | Stories from the Outlands – “Th... 2020-08-11T17:00:10Z UC0ZV6M2THA81QT9hrVWJG3A Apex Legends 20 2020-08-12T00:00:00Z Apex Legends|Apex Legends characters|new Apex ... 2381688 146740 2794 16549 https://i.ytimg.com/vi/M9Pmf9AB4Mo/default.jpg False False While running her own modding shop, Ramya Pare...
3 3C66w5Z0ixs I ASKED HER TO BE MY GIRLFRIEND... 2020-08-11T19:20:14Z UCvtRTOMP2TqYqu51xNrqAzg Brawadis 22 2020-08-12T00:00:00Z brawadis|prank|basketball|skits|ghost|funny vi... 1514614 156914 5857 35331 https://i.ytimg.com/vi/3C66w5Z0ixs/default.jpg False False SUBSCRIBE to BRAWADIS ▶ http://bit.ly/Subscrib...
4 VIUo6yapDbc Ultimate DIY Home Movie Theater for The LaBran... 2020-08-11T15:10:05Z UCDVPcEbVLQgLZX0Rt6jo34A Mr. Kate 26 2020-08-12T00:00:00Z The LaBrant Family|DIY|Interior Design|Makeove... 1123889 45803 964 2198 https://i.ytimg.com/vi/VIUo6yapDbc/default.jpg False False Transforming The LaBrant Family's empty white ...

#loading the category dataset into dataframe to extract the category names, as it is in json format
df1=pd.read_json('CA_category_id.json')
df1
kind etag items
0 youtube#videoCategoryListResponse kBCr3I9kLHHU79W4Ip5196LDptI {'kind': 'youtube#videoCategory', 'etag': 'IfW...
1 youtube#videoCategoryListResponse kBCr3I9kLHHU79W4Ip5196LDptI {'kind': 'youtube#videoCategory', 'etag': '5XG...
2 youtube#videoCategoryListResponse kBCr3I9kLHHU79W4Ip5196LDptI {'kind': 'youtube#videoCategory', 'etag': 'HCj...
3 youtube#videoCategoryListResponse kBCr3I9kLHHU79W4Ip5196LDptI {'kind': 'youtube#videoCategory', 'etag': 'ra8...
4 youtube#videoCategoryListResponse kBCr3I9kLHHU79W4Ip5196LDptI {'kind': 'youtube#videoCategory', 'etag': '7mq...
5 youtube#videoCategoryListResponse kBCr3I9kLHHU79W4Ip5196LDptI {'kind': 'youtube#videoCategory', 'etag': '0Z6...
6 youtube#videoCategoryListResponse kBCr3I9kLHHU79W4Ip5196LDptI {'kind': 'youtube#videoCategory', 'etag': 'K_-...
7 youtube#videoCategoryListResponse kBCr3I9kLHHU79W4Ip5196LDptI {'kind': 'youtube#videoCategory', 'etag': 'I3I...
8 youtube#videoCategoryListResponse kBCr3I9kLHHU79W4Ip5196LDptI {'kind': 'youtube#videoCategory', 'etag': 'D1W...
9 youtube#videoCategoryListResponse kBCr3I9kLHHU79W4Ip5196LDptI {'kind': 'youtube#videoCategory', 'etag': 'QME...
10 youtube#videoCategoryListResponse kBCr3I9kLHHU79W4Ip5196LDptI {'kind': 'youtube#videoCategory', 'etag': 'v2n...
11 youtube#videoCategoryListResponse kBCr3I9kLHHU79W4Ip5196LDptI {'kind': 'youtube#videoCategory', 'etag': 'Qi1...
12 youtube#videoCategoryListResponse kBCr3I9kLHHU79W4Ip5196LDptI {'kind': 'youtube#videoCategory', 'etag': 'IbG...
13 youtube#videoCategoryListResponse kBCr3I9kLHHU79W4Ip5196LDptI {'kind': 'youtube#videoCategory', 'etag': 'gYz...
14 youtube#videoCategoryListResponse kBCr3I9kLHHU79W4Ip5196LDptI {'kind': 'youtube#videoCategory', 'etag': 'hHU...
15 youtube#videoCategoryListResponse kBCr3I9kLHHU79W4Ip5196LDptI {'kind': 'youtube#videoCategory', 'etag': 'KEd...
16 youtube#videoCategoryListResponse kBCr3I9kLHHU79W4Ip5196LDptI {'kind': 'youtube#videoCategory', 'etag': 'tMf...
17 youtube#videoCategoryListResponse kBCr3I9kLHHU79W4Ip5196LDptI {'kind': 'youtube#videoCategory', 'etag': 'tot...
18 youtube#videoCategoryListResponse kBCr3I9kLHHU79W4Ip5196LDptI {'kind': 'youtube#videoCategory', 'etag': 'LNg...
19 youtube#videoCategoryListResponse kBCr3I9kLHHU79W4Ip5196LDptI {'kind': 'youtube#videoCategory', 'etag': 'har...
20 youtube#videoCategoryListResponse kBCr3I9kLHHU79W4Ip5196LDptI {'kind': 'youtube#videoCategory', 'etag': 'M6Y...
21 youtube#videoCategoryListResponse kBCr3I9kLHHU79W4Ip5196LDptI {'kind': 'youtube#videoCategory', 'etag': 'ZFb...
22 youtube#videoCategoryListResponse kBCr3I9kLHHU79W4Ip5196LDptI {'kind': 'youtube#videoCategory', 'etag': 'FD7...
23 youtube#videoCategoryListResponse kBCr3I9kLHHU79W4Ip5196LDptI {'kind': 'youtube#videoCategory', 'etag': '7fv...
24 youtube#videoCategoryListResponse kBCr3I9kLHHU79W4Ip5196LDptI {'kind': 'youtube#videoCategory', 'etag': 'H6d...
25 youtube#videoCategoryListResponse kBCr3I9kLHHU79W4Ip5196LDptI {'kind': 'youtube#videoCategory', 'etag': 'Z3y...
26 youtube#videoCategoryListResponse kBCr3I9kLHHU79W4Ip5196LDptI {'kind': 'youtube#videoCategory', 'etag': '3F8...
27 youtube#videoCategoryListResponse kBCr3I9kLHHU79W4Ip5196LDptI {'kind': 'youtube#videoCategory', 'etag': 'Hwu...
28 youtube#videoCategoryListResponse kBCr3I9kLHHU79W4Ip5196LDptI {'kind': 'youtube#videoCategory', 'etag': 'qJ2...
29 youtube#videoCategoryListResponse kBCr3I9kLHHU79W4Ip5196LDptI {'kind': 'youtube#videoCategory', 'etag': '2sK...
30 youtube#videoCategoryListResponse kBCr3I9kLHHU79W4Ip5196LDptI {'kind': 'youtube#videoCategory', 'etag': '3Ia...

DATA PREPROCESSING

Extracting the category names from the JSON file

# create an empty list to store categories
categories = []

# iterate through each item in the 'items' column of the category DataFrame;
# 'enumerate' keeps track of each item's position as the loop runs
# (df1['items'].items() yields the same position/value pairs)
for i, item in enumerate(df1['items']):
    # extract the category name
    category = item['snippet']['title']
    # append the category name together with its position (i), which is used here as the ID
    categories.append({'categoryId': i, 'category_name': category})

# create a new DataFrame from the list of categories
df_categories = pd.DataFrame(categories)

# print the new DataFrame
df_categories
categoryId category_name
0 0 Film & Animation
1 1 Autos & Vehicles
2 2 Music
3 3 Pets & Animals
4 4 Sports
5 5 Short Movies
6 6 Travel & Events
7 7 Gaming
8 8 Videoblogging
9 9 People & Blogs
10 10 Comedy
11 11 Entertainment
12 12 News & Politics
13 13 Howto & Style
14 14 Education
15 15 Science & Technology
16 16 Movies
17 17 Anime/Animation
18 18 Action/Adventure
19 19 Classics
20 20 Comedy
21 21 Documentary
22 22 Drama
23 23 Family
24 24 Foreign
25 25 Horror
26 26 Sci-Fi/Fantasy
27 27 Thriller
28 28 Shorts
29 29 Shows
30 30 Trailers
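
Each entry in the items list of a YouTube category file also carries its own id field next to snippet.title, and that value need not equal the entry's position in the list. The sketch below assumes the usual videoCategoryListResponse layout and uses a small hand-written sample (sample_items is a hypothetical name, not the real file). It builds the lookup table from id so that the merge key lines up with the categoryId column of the video data.

```python
import pandas as pd

# Hand-written sample mimicking the structure of the category JSON (an assumption,
# not the contents of CA_category_id.json).
sample_items = [
    {"kind": "youtube#videoCategory", "id": "1", "snippet": {"title": "Film & Animation"}},
    {"kind": "youtube#videoCategory", "id": "2", "snippet": {"title": "Autos & Vehicles"}},
    {"kind": "youtube#videoCategory", "id": "10", "snippet": {"title": "Music"}},
]

# Read the ID stored in each item instead of relying on its position in the list.
categories_by_id = pd.DataFrame(
    {
        "categoryId": [int(item["id"]) for item in sample_items],
        "category_name": [item["snippet"]["title"] for item in sample_items],
    }
)

print(categories_by_id)
```

With the real file, the same comprehension could run over df1['items']; whether the resulting names differ from the positional ones is worth checking before merging.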

Merging the category names into the DataFrame

data=df_categories.merge(df,on='categoryId')
data
categoryId category_name video_id title publishedAt channelId channelTitle trending_date tags view_count likes dislikes comment_count thumbnail_link comments_disabled ratings_disabled description
0 1 Autos & Vehicles 5WjcDji3xYc Honest Trailers | Avatar: The Last Airbender 2020-08-11T17:03:59Z UCOpcACMWblDls9Z6GERVi1A Screen Junkies 2020-08-12T00:00:00Z screenjunkies|screen junkies|honest trailers|h... 833369 50183 1120 4634 https://i.ytimg.com/vi/5WjcDji3xYc/default.jpg False False ►►Subscribe to ScreenJunkies!► https://fandom....
1 1 Autos & Vehicles z5l8ovbw_6M Don't be a Tourist 2020-08-10T21:28:49Z UCDQBZcjYKP1J1Nu-Y0_D37Q Tabbes 2020-08-12T00:00:00Z drawing|humor|storytime animation|story|slice ... 1061892 117220 876 9311 https://i.ytimg.com/vi/z5l8ovbw_6M/default.jpg False False This one is for all you full time travelersEMA...
2 1 Autos & Vehicles yVdH3QacEXc Selena Gomez - This is the Year (Official Prem... 2020-08-10T16:32:06Z UCPNxhDvTcytIdvwXWAm43cA Selena Gomez 2020-08-12T00:00:00Z Selena Gomez|David Henrie|Dixie D’Amelio|Charl... 1523818 163684 2377 9845 https://i.ytimg.com/vi/yVdH3QacEXc/default.jpg False False Get your tickets here: https://thisistheyear.f...
3 1 Autos & Vehicles qQ8domUSU7M Fall Guys in a Nutshell 2020-08-07T16:00:24Z UCV6g95OBbVtFmN9uiJzkFqQ CircleToonsHD 2020-08-12T00:00:00Z Fall Guys in a Nutshell|Fall guys|fall|guys|vi... 1045901 71591 869 2734 https://i.ytimg.com/vi/qQ8domUSU7M/default.jpg False False I've never been THIS infuriated at a game THIS...
4 1 Autos & Vehicles PORP0q8nThs Getting Suspended In High School 2020-08-07T20:52:37Z UCRfg0SWjIHm_h95e4V8X5og Young Don The Sauce God 2020-08-12T00:00:00Z young don the sauce god|animations|animated|st... 741546 66330 523 4273 https://i.ytimg.com/vi/PORP0q8nThs/default.jpg False False No Risk. No Reward. Getting Suspended In High ...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
196938 29 Shows F-kvFACZ5yE Denzel Washington Reveals the Aftermath of Wil... 2022-04-03T14:58:54Z UCjQbTcszB-gRhDByY9WhySw T.D. Jakes 2022-04-06T00:00:00Z denzel washington interview|discovering the de... 5037785 59136 0 17582 https://i.ytimg.com/vi/F-kvFACZ5yE/default.jpg False False During the 2022 International Leadership Summi...
196939 29 Shows 3qBcdN4BZhM A Ramadan Without Loneliness | Ramadan 2022 | ... 2022-03-30T19:26:51Z UCPLUqBXpM_YMFJltmBtuZTw Islamic Relief Canada 2022-04-06T00:00:00Z orphans|orphan sponsorship|Islamic relief orph... 82021 53 0 0 https://i.ytimg.com/vi/3qBcdN4BZhM/default.jpg True False This Ramadan 2022, Islamic Relief is continuin...
196940 29 Shows F-kvFACZ5yE Denzel Washington Reveals the Aftermath of Wil... 2022-04-03T14:58:54Z UCjQbTcszB-gRhDByY9WhySw T.D. Jakes 2022-04-07T00:00:00Z denzel washington interview|discovering the de... 5281932 62341 0 18241 https://i.ytimg.com/vi/F-kvFACZ5yE/default.jpg False False During the 2022 International Leadership Summi...
196941 29 Shows F-kvFACZ5yE Denzel Washington Reveals the Aftermath of Wil... 2022-04-03T14:58:54Z UCjQbTcszB-gRhDByY9WhySw T.D. Jakes 2022-04-08T00:00:00Z denzel washington interview|discovering the de... 5436102 63996 0 18418 https://i.ytimg.com/vi/F-kvFACZ5yE/default.jpg False False During the 2022 International Leadership Summi...
196942 29 Shows F-kvFACZ5yE Denzel Washington Reveals the Aftermath of Wil... 2022-04-03T14:58:54Z UCjQbTcszB-gRhDByY9WhySw T.D. Jakes 2022-04-09T00:00:00Z denzel washington interview|discovering the de... 5545806 65037 0 18514 https://i.ytimg.com/vi/F-kvFACZ5yE/default.jpg False False During the 2022 International Leadership Summi...

196943 rows × 17 columns
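
An inner merge silently drops any video whose categoryId has no match in the lookup table. The following is a minimal sketch on invented toy frames (videos and lookup are hypothetical names) of using how='left' with indicator=True to count such rows before they disappear. validate='many_to_one' additionally raises an error if the lookup table ever contains a duplicate key.

```python
import pandas as pd

# Toy data invented for illustration; 99 has no entry in the lookup table.
videos = pd.DataFrame({"title": ["a", "b", "c"], "categoryId": [10, 24, 99]})
lookup = pd.DataFrame({"categoryId": [10, 24], "category_name": ["Music", "Entertainment"]})

# A left merge keeps every video; the _merge column records where each row matched.
merged = videos.merge(
    lookup, on="categoryId", how="left", indicator=True, validate="many_to_one"
)
unmatched = merged[merged["_merge"] == "left_only"]

print(f"{len(unmatched)} of {len(merged)} videos have no category name")
```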

utube=data.copy()

Data Cleaning

Checking for any null values

utube.isna().sum()
categoryId              0
category_name           0
video_id                0
title                   0
publishedAt             0
channelId               0
channelTitle            0
trending_date           0
tags                    0
view_count              0
likes                   0
dislikes                0
comment_count           0
thumbnail_link          0
comments_disabled       0
ratings_disabled        0
description          4096
dtype: int64
utube.dropna(subset=['channelTitle'],inplace=True) # safeguard: drop rows without a channel title (none here; the only nulls are in 'description', which is dropped below)
utube.drop(['categoryId','video_id','channelId','thumbnail_link','description'],axis=1,inplace=True) # drop the columns not needed for the analysis
utube.rename(columns={ 'category_name':'category',
              'publishedAt':'published_at',    # rename columns to a consistent snake_case form
              'channelTitle':'channel_title'
              },inplace=True)
utube.head()
category title published_at channel_title trending_date tags view_count likes dislikes comment_count comments_disabled ratings_disabled
0 Autos & Vehicles Honest Trailers | Avatar: The Last Airbender 2020-08-11T17:03:59Z Screen Junkies 2020-08-12T00:00:00Z screenjunkies|screen junkies|honest trailers|h... 833369 50183 1120 4634 False False
1 Autos & Vehicles Don't be a Tourist 2020-08-10T21:28:49Z Tabbes 2020-08-12T00:00:00Z drawing|humor|storytime animation|story|slice ... 1061892 117220 876 9311 False False
2 Autos & Vehicles Selena Gomez - This is the Year (Official Prem... 2020-08-10T16:32:06Z Selena Gomez 2020-08-12T00:00:00Z Selena Gomez|David Henrie|Dixie D’Amelio|Charl... 1523818 163684 2377 9845 False False
3 Autos & Vehicles Fall Guys in a Nutshell 2020-08-07T16:00:24Z CircleToonsHD 2020-08-12T00:00:00Z Fall Guys in a Nutshell|Fall guys|fall|guys|vi... 1045901 71591 869 2734 False False
4 Autos & Vehicles Getting Suspended In High School 2020-08-07T20:52:37Z Young Don The Sauce God 2020-08-12T00:00:00Z young don the sauce god|animations|animated|st... 741546 66330 523 4273 False False
utube.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 196943 entries, 0 to 196942
Data columns (total 12 columns):
 #   Column             Non-Null Count   Dtype 
---  ------             --------------   ----- 
 0   category           196943 non-null  object
 1   title              196943 non-null  object
 2   published_at       196943 non-null  object
 3   channel_title      196943 non-null  object
 4   trending_date      196943 non-null  object
 5   tags               196943 non-null  object
 6   view_count         196943 non-null  int64 
 7   likes              196943 non-null  int64 
 8   dislikes           196943 non-null  int64 
 9   comment_count      196943 non-null  int64 
 10  comments_disabled  196943 non-null  bool  
 11  ratings_disabled   196943 non-null  bool  
dtypes: bool(2), int64(4), object(6)
memory usage: 16.9+ MB

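The info() output shows that the date columns published_at and trending_date are stored as strings (dtype object), so they are converted to datetime before any date-based analysis. Only the calendar date is kept; the time of day is discarded.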

# parse the ISO 8601 strings, drop the UTC timezone, and keep only the calendar date
utube['published_at'] = pd.to_datetime(utube['published_at']).dt.tz_localize(None).dt.normalize()
utube['trending_date'] = pd.to_datetime(utube['trending_date']).dt.tz_localize(None).dt.normalize()
# derive the publication month (abbreviated name) and the day of the month
utube['publish_month'] = utube['published_at'].dt.strftime('%b')
utube['publish_day'] = utube['published_at'].dt.day
utube.head()
category title published_at channel_title trending_date tags view_count likes dislikes comment_count comments_disabled ratings_disabled publish_month publish_day
0 Autos & Vehicles Honest Trailers | Avatar: The Last Airbender 2020-08-11 Screen Junkies 2020-08-12 screenjunkies|screen junkies|honest trailers|h... 833369 50183 1120 4634 False False Aug 11
1 Autos & Vehicles Don't be a Tourist 2020-08-10 Tabbes 2020-08-12 drawing|humor|storytime animation|story|slice ... 1061892 117220 876 9311 False False Aug 10
2 Autos & Vehicles Selena Gomez - This is the Year (Official Prem... 2020-08-10 Selena Gomez 2020-08-12 Selena Gomez|David Henrie|Dixie D’Amelio|Charl... 1523818 163684 2377 9845 False False Aug 10
3 Autos & Vehicles Fall Guys in a Nutshell 2020-08-07 CircleToonsHD 2020-08-12 Fall Guys in a Nutshell|Fall guys|fall|guys|vi... 1045901 71591 869 2734 False False Aug 7
4 Autos & Vehicles Getting Suspended In High School 2020-08-07 Young Don The Sauce God 2020-08-12 young don the sauce god|animations|animated|st... 741546 66330 523 4273 False False Aug 7
utube['publish_month'].unique()
array(['Aug', 'Sep', 'Oct', 'Nov', 'Dec', 'Jan', 'Feb', 'Mar', 'Apr',
       'May', 'Jun', 'Jul'], dtype=object)
utube['publish_day'].nunique()
31
utube.describe().T
count mean std min 25% 50% 75% max
view_count 196943.0 2.396671e+06 6.225579e+06 0.0 448551.0 925720.0 2099926.5 264407389.0
likes 196943.0 1.274638e+05 3.697903e+05 0.0 17868.0 43493.0 109655.5 16021548.0
dislikes 196943.0 1.592899e+03 9.533623e+03 0.0 0.0 0.0 787.0 879357.0
comment_count 196943.0 9.642997e+03 7.539491e+04 0.0 1137.0 2619.0 6257.0 6738536.0
publish_day 196943.0 1.557807e+01 8.783524e+00 1.0 8.0 15.0 23.0 31.0
# cast a value to float, rounded to two decimals
def convert_scientific_to_decimal(value):
    return round(float(value), 2)
# view_count becomes a float column from here on
utube['view_count'] = utube['view_count'].apply(convert_scientific_to_decimal)
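
If the aim is only to avoid scientific notation in the describe() output, pandas display options can handle that without changing the stored values. The following is a minimal sketch on invented numbers (toy is a hypothetical frame), using pd.option_context so that the setting applies to a single block only.

```python
import pandas as pd

# Invented values for illustration only.
toy = pd.DataFrame({"view_count": [833369, 1061892, 264407389]})

# Inside the context manager, floats are rendered with thousands separators
# and two decimals; the underlying data is left untouched.
with pd.option_context("display.float_format", "{:,.2f}".format):
    summary_text = toy.describe().T.to_string()

print(summary_text)
```

The column keeps its integer dtype here, which avoids the trailing .0 that a float cast introduces in later tables.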

1. Which categories of videos tend to receive the most views, likes, and comments?

# group the videos by category and calculate the average metrics
category_metrics = utube.groupby('category')[['view_count' ,'likes', 'dislikes', 'comment_count']].mean()
category_metrics.sort_values(by=['view_count', 'likes', 'dislikes', 'comment_count'],ascending=[False,False,False,False],inplace=True)
category_metrics=round(category_metrics)
category_metrics
view_count likes dislikes comment_count
category
Shows 3412230.0 174444.0 4253.0 4385.0
Comedy 2895858.0 181284.0 2000.0 18023.0
Foreign 2810967.0 145265.0 1636.0 7748.0
Autos & Vehicles 2442090.0 104692.0 929.0 7255.0
Shorts 2402510.0 99525.0 1421.0 5461.0
Drama 2273363.0 120202.0 2368.0 6066.0
Family 2094707.0 137317.0 1931.0 5491.0
Thriller 1697511.0 92165.0 729.0 4807.0
Anime/Animation 1624911.0 41009.0 631.0 3128.0
Sci-Fi/Fantasy 1480570.0 69848.0 1477.0 3673.0
Horror 1380213.0 19780.0 997.0 4295.0
Science & Technology 1378854.0 69046.0 568.0 3093.0
Music 1012436.0 46504.0 519.0 3355.0
Classics 1007103.0 54525.0 413.0 2729.0
# Create a list of metric names to plot
metric_names = ['view_count', 'likes', 'dislikes', 'comment_count']

# Loop through each metric and create a sorted bar chart
for metric in metric_names:
    # Sort the DataFrame by the current metric in descending order
    sorted_metrics = category_metrics.sort_values(metric, ascending=False)
    
    # Create a Bar trace object with the sorted values
    trace = go.Bar(x=sorted_metrics.index, y=sorted_metrics[metric])
    
    # Create the Figure object
    fig = go.Figure(data=[trace])
    
    # Add title and axis labels, turning e.g. 'view_count' into 'View Count'
    label = metric.replace('_', ' ').title()
    fig.update_layout(title='Average {} by Category'.format(label),
                      xaxis_title='Category',
                      yaxis_title='Average {}'.format(label))
    
    # Show the plot
    fig.show()

2. Which channels have the highest average number of views, likes, dislikes, and comments per trending entry?

utube.head()
category title published_at channel_title trending_date tags view_count likes dislikes comment_count comments_disabled ratings_disabled publish_month publish_day
0 Autos & Vehicles Honest Trailers | Avatar: The Last Airbender 2020-08-11 Screen Junkies 2020-08-12 screenjunkies|screen junkies|honest trailers|h... 833369.0 50183 1120 4634 False False Aug 11
1 Autos & Vehicles Don't be a Tourist 2020-08-10 Tabbes 2020-08-12 drawing|humor|storytime animation|story|slice ... 1061892.0 117220 876 9311 False False Aug 10
2 Autos & Vehicles Selena Gomez - This is the Year (Official Prem... 2020-08-10 Selena Gomez 2020-08-12 Selena Gomez|David Henrie|Dixie D’Amelio|Charl... 1523818.0 163684 2377 9845 False False Aug 10
3 Autos & Vehicles Fall Guys in a Nutshell 2020-08-07 CircleToonsHD 2020-08-12 Fall Guys in a Nutshell|Fall guys|fall|guys|vi... 1045901.0 71591 869 2734 False False Aug 7
4 Autos & Vehicles Getting Suspended In High School 2020-08-07 Young Don The Sauce God 2020-08-12 young don the sauce god|animations|animated|st... 741546.0 66330 523 4273 False False Aug 7
grouped_data=utube.groupby('channel_title')[['view_count' ,'likes', 'dislikes', 'comment_count']].mean()
grouped_data.sort_values(by='view_count',ascending=False,inplace=True)
grouped_data.head()
view_count likes dislikes comment_count
channel_title
CHANDAN ART ACADEMY 1.153215e+08 6.147769e+06 0.000000 40101.166667
Mv Ryhan 8.556066e+07 1.410929e+06 83015.615385 4991.692308
Dr.Harrsha Artist 8.338510e+07 5.499444e+06 0.000000 25629.333333
mingweirocks 7.760728e+07 1.800318e+06 98090.333333 7192.166667
FAMILY BOOMS 7.452598e+07 2.566466e+06 105765.333333 15481.666667
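# keep only the five channels with the highest average views
top_channels = grouped_data.head(5)

fig = go.Figure()

# one trace per metric; only the views trace is visible until another is chosen in the dropdown
fig.add_trace(go.Bar(x=top_channels.index, y=top_channels['view_count'], name='Views'))
fig.add_trace(go.Bar(x=top_channels.index, y=top_channels['likes'], name='Likes', visible=False))
fig.add_trace(go.Bar(x=top_channels.index, y=top_channels['dislikes'], name='Dislikes', visible=False))
fig.add_trace(go.Bar(x=top_channels.index, y=top_channels['comment_count'], name='Comments', visible=False))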

fig.update_layout(
    updatemenus=[
        dict(
            type='dropdown',
            buttons=[
                dict(label='Views',
                     method='update',
                     args=[{'visible': [True, False,False,False]}]),
                dict(label='Likes',
                     method='update',
                     args=[{'visible': [False, True,False,False]}]),
                dict(label='Dislikes',
                     method='update',
                     args=[{'visible': [False, False,True,False]}]),
                dict(label='Comments',
                     method='update',
                     args=[{'visible': [False, False,False,True]}])
            ],
            active=0,
            showactive=True
        )
    ]
)

fig.show()

3. Are the numbers of views, likes, dislikes, and comments for the videos in the dataset related to each other? If so, how strong is each relationship and in which direction does it go?

# Create a correlation matrix
corr_matrix = utube[['view_count', 'likes', 'dislikes', 'comment_count']].corr()

# Create a heatmap using plotly.graph_objs
# (x labels the columns of z and y labels its rows)
heatmap = go.Heatmap(
    z=corr_matrix.values,
    x=corr_matrix.columns.values,
    y=corr_matrix.index.values,
    colorscale="Greens"
)

# Set the title of the plot
layout = go.Layout(
    title="Correlation Matrix Heatmap",
    autosize=False
)

# Create a figure and plot the heatmap
fig = go.Figure(data=[heatmap], layout=layout)

# Show the plot
fig.show()
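
To read the strength and direction of each relationship as numbers rather than colours, the upper triangle of the correlation matrix can be flattened into a ranked list of pairs. The following is a minimal sketch on invented data; the values are not taken from the dataset, and toy, upper and pairs are hypothetical names.

```python
import numpy as np
import pandas as pd

# Invented values for illustration only.
toy = pd.DataFrame(
    {
        "view_count": [100, 200, 300, 400, 500],
        "likes": [10, 22, 29, 41, 52],
        "dislikes": [5, 1, 4, 2, 3],
    }
)

corr = toy.corr()

# Keep each pair once by masking the diagonal and the lower triangle.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# One row per pair, ordered by absolute correlation; the sign gives the direction.
pairs = upper.stack().dropna().sort_values(key=lambda s: s.abs(), ascending=False)

print(pairs)
```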

# Create a scatter plot of likes vs dislikes
plt.scatter(x='dislikes', y='likes', data=utube)
plt.title('Relationship between Likes and Dislikes')
plt.xlabel('Dislikes')
plt.ylabel('Likes')
plt.show()

4. What are the 10 most commonly used tags in videos with high view counts?

# Select videos with views greater than or equal to 1 million
high_views = utube[utube['view_count'] >= 1000000]

# Combine all tags from the selected videos into a single list
all_tags = high_views['tags'].str.split('|').tolist()
all_tags = [tag for tags in all_tags for tag in tags]

# Count the occurrence of each tag
tag_counts = pd.Series(all_tags).value_counts()


print(tag_counts.head(10))
[None]        16591
funny          5760
minecraft      3890
comedy         3766
challenge      2883
highlights     1617
vlog           1605
fun            1601
football       1590
tiktok         1579
dtype: int64
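
The top entry, [None], appears to be the placeholder stored for videos without any tags rather than a real tag. The following is a minimal sketch on invented rows (toy is a hypothetical frame), using explode to flatten the tag lists and dropping the placeholder before counting.

```python
import pandas as pd

# Invented rows for illustration; "[None]" mimics the no-tags placeholder.
toy = pd.DataFrame({"tags": ["funny|comedy", "[None]", "funny|vlog", "[None]"]})

# Split each tag string on '|' and turn every tag into its own row.
tags = toy["tags"].str.split("|").explode()

# Exclude the placeholder, then count the remaining tags.
real_tag_counts = tags[tags != "[None]"].value_counts()

print(real_tag_counts.head(10))
```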

# Create a bar chart of the top 10 most commonly used tags
bar = go.Bar(
    x=tag_counts.head(10).index,
    y=tag_counts.head(10).values,
    marker=dict(color=tag_counts.head(10).values, colorscale='Viridis'),
)

# Set the layout of the chart
layout = go.Layout(
    title='Top 10 Most Commonly Used Tags in Videos with High Views',
    xaxis=dict(title='Tag'),
    yaxis=dict(title='Count'),
)

# Combine the chart and layout, and plot the chart
fig = go.Figure(data=[bar], layout=layout)
fig.show()
# total views and likes per publication month (a list of columns needs double brackets)
month=utube.groupby('publish_month')[['view_count','likes']].sum().sort_values(by=['view_count','likes'],ascending=[False,False])
month
view_count likes
publish_month
Dec 4.949529e+10 2546164299
Mar 4.547636e+10 2394696620
Oct 4.520997e+10 2567250881
Jun 4.143550e+10 2064449171
Aug 3.951106e+10 2332750238
Sep 3.888670e+10 2275353118
Feb 3.729378e+10 1781207179
Jan 3.668217e+10 1927015428
Apr 3.643994e+10 1831073251
Nov 3.608132e+10 2076016412
May 3.509804e+10 1654724976
Jul 3.039737e+10 1652393100
fig = go.Figure()

# one bar trace per metric; only the views trace is visible initially
fig.add_trace(go.Bar(x=month.index, y=month['view_count'], name='Total Views'))
fig.add_trace(go.Bar(x=month.index, y=month['likes'], name='Total Likes', visible=False))

fig.update_layout(
    title='Total Views by Publish Month',
    # the dropdown toggles which trace is visible and updates the title to match
    updatemenus=[
        dict(
            type='dropdown',
            buttons=[
                dict(label='Total Views',
                     method='update',
                     args=[{'visible': [True, False]},
                           {'title': 'Total Views by Publish Month'}]),
                dict(label='Total Likes',
                     method='update',
                     args=[{'visible': [False, True]},
                           {'title': 'Total Likes by Publish Month'}])
            ],
            # index of the button selected when the figure first renders
            active=0,
            showactive=True
        )
    ]
)

fig.show()

5. Which channels have the most trending videos?



# Group the data by channel_title and count the rows per channel
# (a video that trends on several dates has one row per date)
channel_counts = utube.groupby('channel_title')['title'].count()

# Sort the channels by the number of trending entries in descending order
sorted_channels = channel_counts.sort_values(ascending=False)

# Plot the top 10 on a horizontal bar graph, largest at the top
plt.barh(sorted_channels.index[:10], sorted_channels.values[:10])
plt.gca().invert_yaxis()
plt.title('Top 10 Channels with Most Trending Videos')
plt.xlabel('Number of Trending Entries')
plt.show()
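
Because the dataset records one row per video per trending date, counting rows measures trending appearances rather than distinct videos. The following is a minimal sketch on invented rows that contrasts count() with nunique(). Distinct titles stand in for distinct videos here, which is an assumption, since video_id was dropped during cleaning.

```python
import pandas as pd

# Invented rows: channel A has one video that trended on three dates,
# channel B has two different videos that each trended once.
toy = pd.DataFrame(
    {
        "channel_title": ["A", "A", "A", "B", "B"],
        "title": ["v1", "v1", "v1", "v2", "v3"],
    }
)

# Named aggregation: row count versus number of distinct titles per channel.
per_channel = toy.groupby("channel_title")["title"].agg(
    appearances="count", distinct_videos="nunique"
)

print(per_channel)
```

Channel A leads on appearances while channel B leads on distinct videos, so the ranking can depend on which measure is plotted.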

6. How does the day of the week of video publishing affect the number of views and comments?


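# Convert the published_at column to datetime format
utube['published_at'] = pd.to_datetime(utube['published_at'])

# Extract the day of the week into its own column (publish_day already holds the day of the month)
utube['publish_weekday'] = utube['published_at'].dt.day_name()

# Group the data by weekday and calculate the average views and comments,
# ordered Monday to Sunday instead of alphabetically
weekday_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
avg_views_comments = utube.groupby('publish_weekday')[['view_count', 'comment_count']].mean().reindex(weekday_order)

# Plot each metric in its own panel, since views and comments differ by orders of magnitude
axes = avg_views_comments.plot(kind='bar', subplots=True, legend=False,
                               title=['Average Views by Day of the Week',
                                      'Average Comments by Day of the Week'])
axes[-1].set_xlabel('Day of the Week')
plt.tight_layout()
plt.show()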

7. Which channels have published videos that received the most views within a recent period of time, and what are the top 5 among them?

# Filter the data to only include videos published within the last week
one_week_ago = pd.Timestamp.now() - pd.Timedelta(days=7)
recent_videos = utube[utube['published_at'] >= one_week_ago]

# Sort the recent_videos dataframe by view count in descending order
sorted_videos = recent_videos.sort_values(by='view_count', ascending=False)
sorted_videos['trend_time']=(sorted_videos['trending_date']-sorted_videos['published_at']).dt.days


max_views_per_channel = sorted_videos.groupby('channel_title').agg({'title': 'first', 'view_count': 'max','trend_time': 'first'})
max_views_per_channel.sort_values(by=['view_count','trend_time'],ascending=[False,True],inplace=True)
max_views_per_channel
title view_count trend_time
channel_title
NBA #6 WARRIORS at #3 KINGS | FULL GAME 2 HIGHLIGH... 2132127.0 1
NBA on TNT “What Was He Supposed To Do?” | Inside Reacts ... 1162124.0 1
Skip and Shannon: UNDISPUTED Draymond Green ejected for stomping on Sabonis... 682492.0 1
Babish Culinary Universe Binging with Babish: Tyler's Bullsh*t from The... 650329.0 1
Bleacher Report Draymond Ejected After STEPPING On Sabonis 😳 627056.0 1
fantano Frank Ocean Flopped 528842.0 1
NCT NCT DOJAEJUNG 엔시티 도재정 'Perfume' Performance Video 514476.0 1
Wendover Productions How Corporate Consolidation is Killing Ski Towns 465326.0 1
Practical Engineering East Palestine Train Derailment Explained 367154.0 1
Eddie Hall The Beast Reunited with Brian Shaw | World's Strongest M... 325737.0 1
NHL Edmonton Oilers falter in Game 1 | Kings @ Oil... 237337.0 1
SPORTSNET NHL Game 1 Highlights | Kings vs. Oilers - Apr... 233953.0 1
Nick Viall Freestyle with Love is Blind’s Marshall Glaze ... 230990.0 1
OfflineTV & Friends gotta catch 'em all! 193400.0 1
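
pd.Timestamp.now() ties the filter to the day the notebook is run, so rerunning it later against the same file can return an empty frame. The following is a minimal sketch on invented dates (toy is a hypothetical frame) that anchors the seven-day window to the latest published_at value in the data instead.

```python
import pandas as pd

# Invented rows for illustration only.
toy = pd.DataFrame(
    {
        "channel_title": ["A", "B", "C"],
        "published_at": pd.to_datetime(["2023-04-01", "2023-04-15", "2023-04-18"]),
        "view_count": [500, 300, 200],
    }
)

# Measure "recent" relative to the newest video in the data, not the wall clock.
latest = toy["published_at"].max()
window_start = latest - pd.Timedelta(days=7)
recent = toy[toy["published_at"] >= window_start]

print(recent)
```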
# Create a list of column names to plot
graph_names = ['view_count','trend_time']

# Loop through each column and plot its values for the top 5 channels
for graph in graph_names:
    # Create a Bar trace object for the top 5 channels (already sorted by view count)
    trace = go.Bar(x=max_views_per_channel.index[:5], y=max_views_per_channel[graph][:5])

    # Create the Figure object
    fig = go.Figure(data=[trace])

    # Add title and axis labels, turning e.g. 'view_count' into 'View Count'
    label = graph.replace('_', ' ').title()
    fig.update_layout(title='{} of the most viewed recent video per channel'.format(label),
                      xaxis_title='Channel Name',
                      yaxis_title=label)
    
    # Show the plot
    fig.show()